ABSTRACT


Public surveillance videos increasingly play a key role in identifying incidents and people who
misbehave or perform illegal activities. Monitoring surveillance videos manually to detect abnormalities
is time consuming and may delay access to the required information. Real-time video analytics based on
Artificial Intelligence (AI) can help acquire such information in time to make well-informed decisions.
Deep learning in particular is of great help in learning from incidents and detecting anomalous
behaviours. In this study, we propose an automatic system, based on deep learning, for anomaly detection
in surveillance videos. For anomaly detection, an enhanced Convolutional Neural Network (CNN) model is
employed. We present a method called Learning based Video Anomaly Detection (LbVAD), which relies on the
enhanced CNN model. A loss function is defined to lower the error rate of the prediction process. For our
empirical study, we gathered data from several benchmark datasets, including UMN, UCSD, Ped1, and Ped2.
According to our experimental results, the proposed approach outperforms existing models.


Keywords: Machine Learning, Deep Learning, Artificial Intelligence, Video Abnormality Detection 


1. INTRODUCTION 


In recent years, incidents in public places involving human misbehaviour, traffic accidents, fire
accidents and so on have been increasing. When such mishaps occur, it is very important to establish
evidence of the events. To this end, public surveillance videos play a crucial role, as they stream
video content continuously. However, the videos must be analysed to identify incidents that look
abnormal [1]. The traditional approach of human observation of videos is very time consuming and
delays decision making. Artificial Intelligence-enabled techniques were developed to solve this issue.
Deep learning models in particular are widely used for image processing, and models such as CNN are
found to be more suitable for dealing with image content [2]. Automatic detection of video
abnormalities and notification of the concerned authorities are essential for public video
surveillance to be useful. Many researchers have contributed learning-based approaches towards this
end. Moin et al. [4] analysed surveillance camera data for effective anomaly detection using deep
learning techniques, enhancing accuracy and isolation for pre-training. Nawaratne et al. [9] proposed
deep learning for detecting evolving anomalies in real-time surveillance, in which active learning
updates the anomalies to address dynamic challenges. Gamarra et al. [14] reported improved performance
with a new IVADC-FDRL model for anomaly detection and classification in surveillance footage. Attar et
al. [17] showed promise for various domains such as health care and self-driving cars, but noted that
challenges remain for real-time anomaly detection and parallel deep learning architectures. Vu et al.
[22] presented a multi-channel system for supervised anomaly detection in CCTV footage that
outperforms existing methods; next steps include developing an end-to-end model and investigating new
datasets. Asad et al. [26] suggested a two-stage design for anomaly identification in surveillance
videos, emphasizing spatiotemporal features and utilizing deep learning. Hussein et al. [30]
investigated the importance of human behaviour recognition, which has led to increased focus on
anomaly detection. The literature review shows that a deep learning-based framework is required to
enhance the detection procedure. Our contributions in this paper are as follows.


1. A deep learning-based framework is proposed for automatically detecting
abnormalities in surveillance videos.

2. An enhanced Convolutional Neural Network (CNN) model is used for
detecting abnormalities.

3. We propose an algorithm, Learning based Video Anomaly Detection (LbVAD),
which exploits the enhanced CNN model.

4. A program is designed to assess the LbVAD algorithm and compare its
performance with existing models.


The rest of the paper is organised as follows. Section 2 reviews existing deep learning-based
methods. Section 3 presents our methodology for video anomaly detection. Section 4 presents the
experimental results. Section 5 concludes the study and outlines directions for further research.


2. RELATED WORK 


This section reviews existing video abnormality detection methods. Nayak et al. [1] observed that
video surveillance is widely used in public places for safety, and that anomaly detection remains
challenging because of varied factors and a lack of research. Aberkane et al. [2] explored deep
learning combined with reinforcement learning to detect anomalies in surveillance videos, addressing
computational cost for improved efficiency. Kiran et al. [3] noted that surveillance videos lack
annotations, which necessitates unsupervised anomaly detection; the categories of methods include
generative, spatiotemporal predictive, and reconstruction-based models. Moin et al. [4] analysed
surveillance camera data for effective anomaly detection using deep learning techniques, enhancing
accuracy and isolation for pre-training. Amin et al. [5] studied the EADN model, which addresses
surveillance complexity with CNN and LSTM for precise anomaly detection in surveillance data. Shah et
al. [6] proposed a deep learning model that detects anomalies and outperforms existing methods on
challenging real-world datasets. Rezaee et al. [7] proposed automated detection of crowd anomalies to
aid effective security; the methods involve crowd analysis, tracking, and deep learning. Mohan et al.
[8] found that public places increasingly utilize video surveillance for security; their anomaly
detection combines PCANet and CNN for accurate recognition and localization.


Nawaratne et al. [9] proposed deep learning for detecting evolving anomalies in real-time
surveillance, with active learning updating the anomalies to address dynamic challenges. Singh et al.
[10] proposed a DNN-based method for identifying anomalies in CCTV footage that performs comparably
with lower complexity. Chriki et al. [11] introduced deep features and manually designed algorithms
for anomaly identification in UAV-based surveillance operations. Doshi et al. [12] proposed online
anomaly detection using transfer and continual learning, enhancing surveillance capabilities for
dynamic scenarios. Shao et al. [13] developed a deep learning framework that detects anomalies in
videos and offers answers, demonstrating comparable performance. Gamarra et al. [14] presented a new
IVADC-FDRL model for the identification and classification of anomalies in surveillance footage,
demonstrating superior performance. Amudha et al. [15] observed that identifying anomalies in video
surveillance can be difficult because of dynamic environments; their deep learning model efficiently
predicts anomalies, surpassing others. Shen et al. [16] proposed Spatial-Temporal Fusion Features
(STFF) in a Fast Sparse Coding Network (FSCN) to improve video abnormality identification; the FSCN
efficiently generates sparse coefficients, surpassing traditional methods. Attar et al. [17] showed
promise for various domains such as health care and self-driving cars, but noted that challenges
remain for real-time anomaly detection and parallel deep learning architectures. Yu et al. [18]
proposed a human-machine cooperative approach for video anomaly detection that incorporates expert
feedback to improve anomaly classification. Experimental results show competitive performance, and
further research is warranted to improve computational efficiency.

Table 1: Summary Of Important Literature Findings

References | Methods | Dataset | Advantages | Limitations
[2] | DL and ... | UCF10 and HMDB51 | ... | ...
... | ... | ... | High accuracy | ...
... | ... | ... | Reduces training time | Anomaly localization needs to be done.
... | ... | ... | ... | Computational cost is more
... | ... | UCSD Ped1 and UCSD Ped2 | High accuracy | Attention based DL is to be explored.
... | ... | UCSD Pedestrian and CUHK Avenue | Can handle high dimensional data with better performance. | False negatives are to be reduced.
[12] | CNN and ... | CUHK Avenue, ... | ... | ...
... | ... | ... | ... and acceptable detection | More diversified datasets are to be used
... | ... | ... | ... leading to better accuracy ... efficiency | To be evaluated with more challenging scenarios
[22] | ... | Avenue, Ped1 | Improved PSNR in the detection results | ... dataset
[26] | ... | Ped1, Ped2, CUHK ... | Modeling ... characterization improved | Needs improvement in terms of segmentation and tracking.


Mansour et al. [20] introduced IVADC-FDRL, a sophisticated approach for identifying and classifying
anomalies in surveillance footage. The model leverages Faster R-CNN and DQL for accurate detection,
demonstrating superior performance on the UCSD dataset. Boudihir et al. [21] introduced a Deep Q
Learning Network to identify and localize irregularities in security footage. Experimental results
demonstrate superior performance, and future work aims to minimize computational costs. Vu et al.
[22] presented a multi-channel system for supervised identification of anomalies in surveillance
footage, outperforming existing methods. Future work involves building a complete model and adding
fresh datasets. Lian et al. [23] offered a TSC framework for identifying anomalies, optimizing
parameters and using data-dependent similarity measurements for improved performance. Furthermore, it
introduces a comprehensive dataset to support the proposed approach. Nabi et al. [24] suggested an
anomaly detection technique based on GANs for crowded scenes, outperforming existing approaches in
various evaluation tasks. Future work will explore alternative motion representation methods for
improved performance. Lopes et al. [25] proposed an approach that combines various features for
anomaly scoring, validated by experiments on public datasets; further research is needed. Asad et al.
[26] suggested a two-stage architecture for anomaly identification in surveillance videos,
emphasizing spatiotemporal features and utilizing deep learning.


Yuan et al. [27] introduced an innovative autoencoder architecture to capture appearance and motion
regularities separately. Enhanced by a variance attention module and deep K-means clusters, the
method shows superior performance on various datasets. Hoang et al. [28] found that deep learning is
essential for anomaly detection in video surveillance and human behaviour recognition. Various
methods, including reconstruction-based and classification techniques, have significantly advanced
the field. Benchmark datasets assist in addressing challenges related to robust feature extraction in
dynamic environments. Lee et al. [29] noted that anomaly detection remains an attractive area of
study because of its complexity and the limited availability of normal data; recent methods are
surveyed, considering network architectures and datasets from 2015 to 2018. Hussein et al. [30]
investigated the importance of human behaviour recognition, which has led to increased focus on
anomaly detection. This comprehensive review discusses DL methods, architectures, datasets, and
performance metrics in video anomaly detection (AD). Applications and challenges are also highlighted
for future research. Table 1 summarises the important literature findings. The literature review
shows that a deep learning-based framework that could improve the detection process is necessary.


Journal of Theoretical and Applied Information Technology
15th May 2024. Vol.102. No 9
© Little Lion Scientific
ISSN: 1992-8645


3. METHODOLOGY 


This section presents our methodology for identifying anomalies in videos using supervised learning.
During training, the proposed method, shown in Figure 1, starts by dividing each surveillance video
into a preset number of segments. These segments serve as the instances of a bag. We train the
anomaly detection model on both positive (anomalous) and negative (normal) bags using the proposed
deep MIL (multiple instance learning) ranking loss. In conventional supervised classification with a
support vector machine, where the labels of all positive and negative samples are known, the
classifier is trained with the optimization expressed in Eq. 1.


$$\min_{w} \left[ \frac{1}{k} \sum_{i=1}^{k} \max\left(0,\ 1 - y_i \left(w \cdot \phi(x) - b\right)\right) \right] + \frac{1}{2} \lVert w \rVert^2 \qquad (1)$$


The first term is the hinge loss, where $y_i$ denotes the label of each sample, $w$ is the classifier
to be learnt, $\phi(x)$ denotes the features extracted from a video clip or an image patch, $b$ is a
bias, and $k$ is the number of training samples. Annotations of positive and negative examples are
required in order to train a robust classifier. In the context of supervised anomaly detection, a
classifier requires temporal annotations for each video segment. However, obtaining temporal
annotations for videos is time-consuming and laborious.


Figure 1: Overview of the proposed methodology
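
As an illustration of Eq. 1, the following minimal NumPy sketch evaluates the hinge-loss objective for a linear classifier. The function name `svm_objective` and the toy data are assumptions made for illustration only; they are not taken from the paper's implementation, and the features are assumed to be precomputed vectors $\phi(x)$.

```python
import numpy as np


def svm_objective(w, b, features, labels):
    """Evaluate Eq. 1: mean hinge loss plus (1/2) * ||w||^2.

    features: array of shape (k, d), one precomputed feature vector per sample.
    labels:   array of shape (k,), with values +1 (anomalous) or -1 (normal).
    """
    margins = labels * (features @ w - b)      # y_i * (w . phi(x) - b)
    hinge = np.maximum(0.0, 1.0 - margins)     # max(0, 1 - margin)
    return hinge.mean() + 0.5 * np.dot(w, w)


# Toy example (hypothetical values).
features = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]])
labels = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0
value = svm_objective(w, b, features, labels)
```

In this toy case only the second sample violates the margin, so the objective is the sum of its averaged hinge penalty and the regularization term.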

MIL relaxes the requirement of having these precise temporal annotations. In MIL, the exact temporal
locations of anomalous events in the videos are not known. Instead, only video-level labels
indicating the presence of an anomaly in the whole video are required. Videos containing anomalies
are labelled positive; those without any are labelled negative. A positive video is then represented
as a positive bag $B_a$, in which the individual temporal segments form the instances
$(p^1, p^2, \ldots, p^m)$, where $m$ is the number of instances in the bag. We assume that at least
one of these instances contains the anomaly. Similarly, a negative video is represented as a negative
bag $B_n$, whose temporal segments form the negative instances $(n^1, n^2, \ldots, n^m)$. No instance
in the negative bag contains an anomaly. Since the exact information (i.e., the instance-level label)
of the positive instances is unknown, the objective function can be optimized with respect to the
highest-scoring instance in each bag [31]. It is expressed in Eq. 2.


$$\min_{w} \left[ \frac{1}{z} \sum_{j=1}^{z} \max\left(0,\ 1 - Y_{B_j} \left( \max_{i \in B_j} \left(w \cdot \phi(x_i)\right) - b \right)\right) \right] + \frac{1}{2} \lVert w \rVert^2 \qquad (2)$$


where $Y_{B_j}$ indicates the label at the bag level, $z$ is the total number of bags, and the
remaining variables are the same as in Eq. 1. Anomalous behaviour is difficult to define accurately
since it is very subjective and differs widely from person to person [32]. Moreover, assigning 1/0
labels to anomalies is not clear-cut. Additionally, because there are not enough examples of
anomalies, anomaly detection is usually handled as low-likelihood pattern detection rather than as a
classification problem [33]. In our proposed technique, we formulate anomaly detection as a
regression problem. The goal is for anomalous video segments to obtain higher anomaly scores than
normal segments. The simplest approach would be a ranking loss that encourages anomalous video
segments to receive higher scores than normal segments, as indicated in Eq. 3.


$$f(V_a) > f(V_n) \qquad (3)$$


where $V_a$ and $V_n$ denote anomalous and normal video segments, and $f(V_a)$ and $f(V_n)$ denote
the corresponding predicted anomaly scores, which lie between 0 and 1. If segment-level annotations
were known during training, this ranking function would work well. However, Eq. 3 cannot be used when
annotations at the video segment level are absent. Instead, we propose a multiple instance ranking
objective function, expressed in Eq. 4.


$$\max_{i \in B_a} f(V_a^i) > \max_{i \in B_n} f(V_n^i) \qquad (4)$$


where the maximum is taken over all video segments in each bag. Rather than enforcing the ranking on
every instance of a bag, we enforce it only on the two instances with the highest anomaly scores in
the positive and negative bags, respectively. The segment with the highest anomaly score in the
positive bag is most likely the true positive instance (the anomalous segment). The segment with the
highest anomaly score in the negative bag is the one that looks most like an anomalous segment but is
actually a normal instance. This negative instance is considered a hard instance, one that might
cause a false alarm in anomaly detection. The goal of applying Eq. 4 is to push the anomaly scores of
the positive and negative instances far apart. The hinge-loss formulation therefore gives the ranking
loss expressed in Eq. 5.


$$l(B_a, B_n) = \max\left(0,\ 1 - \max_{i \in B_a} f(V_a^i) + \max_{i \in B_n} f(V_n^i)\right) \qquad (5)$$
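
The ranking loss of Eq. 5 can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the per-segment anomaly scores of one positive and one negative bag are already available as arrays; the function name `mil_ranking_loss` and the score values are hypothetical.

```python
import numpy as np


def mil_ranking_loss(pos_scores, neg_scores):
    """Eq. 5: hinge loss between the top-scoring instances of the two bags."""
    top_pos = float(np.max(pos_scores))   # highest score in the positive bag
    top_neg = float(np.max(neg_scores))   # highest score in the negative bag
    return max(0.0, 1.0 - top_pos + top_neg)


# Hypothetical scores for a positive (anomalous) and a negative (normal) bag.
loss_overlapping = mil_ranking_loss([0.1, 0.9, 0.2], [0.3, 0.2, 0.1])
loss_separated = mil_ranking_loss([0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
```

The loss vanishes only when the top positive score exceeds the top negative score by a margin of at least 1, which for scores in [0, 1] means the two maxima sit at opposite ends of the range.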


One limitation of the above loss is that it ignores the underlying temporal structure of the
anomalous video. First, anomalies in real-world situations often last for only a short time. In this
case, only a few segments may contain the anomaly, so the scores of the instances (segments) in the
anomalous bag should be sparse. Second, since the video is a sequence of segments, the anomaly score
should vary smoothly between video segments. We therefore enforce temporal smoothness between the
anomaly scores of temporally adjacent video segments by minimizing the difference between the scores
of adjacent segments. Adding the sparsity and smoothness constraints on the instance scores turns the
loss function into Eq. 6.


$$l(B_a, B_n) = \max\left(0,\ 1 - \max_{i \in B_a} f(V_a^i) + \max_{i \in B_n} f(V_n^i)\right) + \lambda_1 \sum_{i=1}^{m-1} \left(f(V_a^i) - f(V_a^{i+1})\right)^2 + \lambda_2 \sum_{i=1}^{m} f(V_a^i) \qquad (6)$$


where the $\lambda_1$ term is the temporal smoothness term and the $\lambda_2$ term is the sparsity
term. In this MIL ranking loss, the error is back-propagated from the highest-scoring video segments
in both the positive and negative bags. After training on a large number of positive and negative
bags, we expect the network to learn a generalized model that predicts high scores for anomalous
segments in positive bags. Finally, the complete objective function is given in Eq. 7.


$$L(W) = l(B_a, B_n) + \lambda_3 \lVert W \rVert_F \qquad (7)$$


Here, $W$ denotes the model weights. Every video is divided into an equal number of discrete temporal
segments, which are used as bag instances. For each video segment, we extract 3D convolution features
[36]. We use this feature representation because it is computationally efficient and has a
demonstrated ability to capture appearance and motion dynamics in video action detection.
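
To make the structure of the constrained loss in Eq. 6 concrete, the sketch below adds the temporal smoothness and sparsity terms to the ranking hinge loss. It is an illustrative NumPy version, not the authors' training code: the function name and the values chosen for `lambda1` and `lambda2` are assumptions, and the weight regularization of Eq. 7 is left out.

```python
import numpy as np


def constrained_mil_loss(pos_scores, neg_scores, lambda1, lambda2):
    """Eq. 6: ranking hinge loss + smoothness and sparsity on the positive bag."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)

    # Ranking term between the top-scoring instance of each bag (Eq. 5).
    ranking = max(0.0, 1.0 - pos.max() + neg.max())

    # Temporal smoothness: squared differences of adjacent segment scores.
    smoothness = np.sum(np.diff(pos) ** 2)

    # Sparsity: sum of the scores in the positive bag.
    sparsity = np.sum(pos)

    return ranking + lambda1 * smoothness + lambda2 * sparsity


# Hypothetical bag scores and weights, chosen only to keep the arithmetic simple.
loss = constrained_mil_loss([0.0, 1.0, 0.0], [0.0, 0.0, 0.0],
                            lambda1=0.5, lambda2=0.25)
```

Here the ranking term is zero, the smoothness term sums two unit jumps, and the sparsity term counts the single high score, so the penalty comes entirely from the two constraints.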


Algorithm: Learning based Video Anomaly Detection (LbVAD)
Input: Training dataset of videos D, test video v
Output: Anomaly detection results R and performance statistics P

1. Begin

2. Configure enhanced CNN model m

3. Compile the model m

4. F ← ExtractFeatures(v)

5. (pbags, nbags) ← Divide(F)

6. Train m with D

7. Save the model m

8. R ← DetectAbnormalities(m, pbags, nbags)

9. P ← Evaluation(R, ground truth)

10. Display R

11. Display P

12. End

Algorithm 1: Learning based Video Anomaly Detection (LbVAD)


Our proposed method, Learning based Video Anomaly Detection (LbVAD), exploits the enhanced CNN model.
A loss function is defined to reduce the error rate in the prediction process. For the empirical
study, we collected data from different benchmark datasets such as UMN, UCSD, Ped1 and Ped2. The
algorithm is based on the enhanced CNN: the model learns from the given training data. Features are
extracted from the given test video and divided into positive and negative bags, which are then
passed to the trained model to perform abnormality detection.
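
The division of a video's features into a fixed number of bag instances can be sketched as follows. This is an assumption-laden illustration: the paper does not state how clip-level features are pooled within a segment, so the sketch simply averages them, and `split_into_segments` and the segment count used in the example are hypothetical.

```python
import numpy as np


def split_into_segments(clip_features, num_segments):
    """Group clip-level features into num_segments bag instances.

    clip_features: array of shape (num_clips, feature_dim), ordered in time.
    Returns an array of shape (num_segments, feature_dim) in which each row is
    the mean feature of one temporal segment (mean pooling is an assumption).
    """
    clip_features = np.asarray(clip_features, dtype=float)
    if clip_features.shape[0] < num_segments:
        raise ValueError("need at least one clip per segment")
    # array_split keeps temporal order and tolerates uneven division.
    chunks = np.array_split(clip_features, num_segments, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])


# Toy example: 10 clips with 4-dimensional features, grouped into 5 segments.
clips = np.arange(40, dtype=float).reshape(10, 4)
bag = split_into_segments(clips, num_segments=5)
```

Each row of `bag` then plays the role of one instance when the bag is scored by the trained model.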


4. EXPERIMENTAL RESULTS 


This section presents the experimental results of our methodology. Python and libraries such as
Keras and TensorFlow are used to develop the enhanced CNN model that forms part of the proposed
framework. A loss function is defined to reduce the error rate in the prediction process. For the
empirical study, we collected data from several benchmark datasets, including Ped1 and Ped2, UMN,
UCSD, and others.

Figure 2: Dataset Dynamics In Terms Of Number Of Videos And Their Length


As presented in Figure 2, the collected dataset comprises videos of different lengths.

Figure 3: Abnormality Detection Results Of Experiments


As presented in Figure 3, the proposed system detects different abnormal activities accurately.
Performance statistics are obtained by comparing the predictions with the ground truth.


Table 2: Performance Comparison

Detection method | AUC (%)
Binary classifier method | 75.45
Method in [34] | 80.60
Method in [35] | 85.51
Proposed method | 92.79

As shown in Table 2, the performance of the proposed method is expressed in terms of AUC and
compared with that of existing approaches.


[Figure: AUC (%) by detection method; legend: binary classifier method, method in [34], method in [35], proposed method]


Figure 4: Performance Comparison Among Detection Models

As presented in Figure 4, the proposed method for automatic detection of video anomalies is
evaluated in terms of AUC, where a higher AUC indicates better performance. The proposed approach
achieved the highest performance, which we attribute to its efficient internal processing and loss
function. The baseline binary classifier achieved 75.45% AUC, the method in [34] 80.60%, and the
method in [35] 85.51%, while the proposed method achieved 92.79% AUC. Based on these findings, the
proposed technique provides a reliable means of automatically identifying anomalies in public
surveillance footage.
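
For readers unfamiliar with the metric, the sketch below shows one common way to compute AUC from anomaly scores and binary ground-truth labels, as the fraction of (anomalous, normal) pairs that are ranked correctly. It is a generic illustration with hypothetical data; the paper does not specify the evaluation code or the granularity (frame, segment or video level) at which AUC was computed.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC as the probability that an anomalous item outscores a normal one.

    labels: 1 for anomalous, 0 for normal. Ties between scores count as half.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both classes must be present")
    # Compare every anomalous score with every normal score.
    diff = pos[:, None] - neg[None, :]
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return correct / (len(pos) * len(neg))


# Hypothetical ground truth and predicted anomaly scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the toy data, three of the four anomalous/normal pairs are ordered correctly, which gives an AUC of 0.75.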


5. CONCLUSION AND FUTURE WORK 


In this study, we proposed a deep learning-based framework for automatically detecting abnormalities
in surveillance videos. For anomaly detection, an enhanced Convolutional Neural Network (CNN) model
is employed. In our technique, anomaly detection is formulated as a regression problem: the aim is
to obtain higher anomaly scores for anomalous video segments than for normal ones. The simplest
approach is a ranking loss that encourages higher scores for anomalous video segments than for
normal segments. We presented a method called Learning based Video Anomaly Detection (LbVAD), which
relies on the enhanced CNN model. A loss function is defined to lower the error rate of the
prediction process. Ped1 and Ped2, UMN, UCSD, and other benchmark datasets are used in the empirical
study. According to our experimental results, the proposed algorithm outperforms existing models
with an AUC of 92.79%. In future, we intend to enhance our framework by including deep learning
models based on the Generative Adversarial Network (GAN) architecture.